
    Nonnegative principal component analysis for mass spectral serum profiles and biomarker discovery

    Background: As a novel cancer diagnostic paradigm, mass spectroscopic serum proteomic pattern diagnostics has been reported to be superior to conventional serologic cancer biomarkers. However, its clinical use has not yet been fully validated. An important factor preventing this young technology from becoming a mainstream cancer diagnostic paradigm is that robustly identifying cancer molecular patterns from high-dimensional protein expression data remains a challenge in machine learning and oncology research. As a well-established dimension reduction technique, PCA is widely integrated into pattern recognition analysis to discover cancer molecular patterns. However, its global feature selection mechanism prevents it from capturing local features. This can make high-performance proteomic pattern discovery difficult, because only features that interpret global data behavior are used to train a learning machine. Methods: In this study, we develop a nonnegative principal component analysis algorithm and present a nonnegative principal component analysis based support vector machine algorithm with sparse coding to conduct high-performance proteomic pattern classification. We also propose a nonnegative principal component analysis based filter-wrapper biomarker capturing algorithm for mass spectral serum profiles. Results: We demonstrate the superiority of the proposed algorithm by comparison with six peer algorithms on four benchmark datasets. We also illustrate that nonnegative principal component analysis can be used effectively to capture meaningful biomarkers. Conclusion: Our analysis suggests that nonnegative principal component analysis effectively conducts local feature selection for mass spectral profiles and contributes to improved sensitivity and specificity in the subsequent classification, as well as to meaningful biomarker discovery.
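    The abstract does not give the NPCA update rules. To illustrate what a nonnegativity constraint does to a principal component, the sketch below finds one nonnegative, unit-norm loading vector w that locally maximizes the projected variance w^T C w, using a projected power iteration. This is a generic technique swapped in for illustration, not the authors' algorithm; the function names and the toy data are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def covariance(X: Matrix) -> Matrix:
    """Sample covariance of X (rows = spectra, columns = m/z features)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        dev = [row[j] - means[j] for j in range(d)]
        for i in range(d):
            for j in range(d):
                C[i][j] += dev[i] * dev[j] / (n - 1)
    return C


def nonnegative_leading_component(C: Matrix, iters: int = 1000,
                                  tol: float = 1e-12) -> List[float]:
    """Projected power iteration for max w^T C w s.t. w >= 0, ||w|| = 1.

    Hypothetical helper, not the paper's NPCA. For positive semidefinite C
    each step cannot decrease w^T C w, but only a local optimum is guaranteed.
    """
    d = len(C)
    # Start from the unit vector of the highest-variance feature (a feasible point).
    k = max(range(d), key=lambda i: C[i][i])
    w = [1.0 if i == k else 0.0 for i in range(d)]
    for _ in range(iters):
        # Ascent direction C w, with negative loadings clipped to zero.
        g = [max(sum(C[i][j] * w[j] for j in range(d)), 0.0) for i in range(d)]
        norm = math.sqrt(sum(v * v for v in g))
        if norm == 0.0:
            break  # no feasible ascent direction; keep the current w
        new_w = [v / norm for v in g]
        converged = max(abs(a - b) for a, b in zip(new_w, w)) < tol
        w = new_w
        if converged:
            break
    return w
```

    With positively correlated features the loading spreads across both (for `covariance([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])` it approaches equal weights of about 0.7071); with anti-correlated features, such as `[[1.0, -0.9], [-0.9, 1.0]]`, the constraint pushes the loading onto a single feature. This loosely mirrors the "local feature" behavior the abstract attributes to nonnegative PCA.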

    Application of multiple statistical tests to enhance mass spectrometry-based biomarker discovery

    Background: Mass spectrometry-based biomarker discovery has long been hampered by the difficulty of reconciling the lists of discriminatory peaks identified by different laboratories for the same disease. We describe a multi-statistical analysis procedure that combines several independent computational methods. The approach capitalizes on the strengths of each method, applying all of them to the same high-resolution mass spectral data set to discover consensus differential mass peaks that should be robust biomarkers for distinguishing between disease states. Results: The proposed methodology was applied to a pilot narcolepsy study using logistic regression, hierarchical clustering, t-test, and CART. Consensus differential mass peaks with high predictive power were identified across three of the four statistical platforms. On the diagnostic accuracy measures investigated, the performance of the consensus-peak model was a compromise between logistic regression and CART, which produced better models than hierarchical clustering and t-test. However, consensus peaks confer a higher level of confidence in their ability to distinguish between disease states, since they do not represent peaks that result from the biases of a particular statistical algorithm. Instead, they were selected as differential under differing data distribution assumptions, demonstrating their true discriminatory potential. Conclusion: The methodology described here is applicable to any high-resolution MALDI mass spectrometry-derived data set with minimal mass drift, which is essential for peak-to-peak comparison studies. Four statistical approaches with differing data distribution assumptions were applied to the same raw data set to obtain consensus peaks that were statistically differential between the two groups compared. These consensus peaks demonstrated high diagnostic accuracy when used to form a predictive model, as evaluated by receiver operating characteristic (ROC) curve analysis. They should demonstrate a higher discriminatory ability because they are not biased toward a particular algorithm. Thus, they are prime candidates for downstream identification and validation efforts.
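    The consensus rule described above (keep a peak only if enough independent methods flag it) and the ROC evaluation can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes peaks are already aligned to shared m/z values (the abstract stresses minimal mass drift for exactly this reason), and the method names, m/z values, and scores in the usage note are hypothetical. The AUC is computed with the Mann-Whitney formulation, counting ties as one half.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def consensus_peaks(selected: Dict[str, Set[float]], min_methods: int) -> List[float]:
    """Peaks (aligned m/z values) flagged as differential by >= min_methods methods."""
    counts = Counter(peak for peaks in selected.values() for peak in peaks)
    return sorted(peak for peak, n in counts.items() if n >= min_methods)


def roc_auc(scores_pos: Iterable[float], scores_neg: Iterable[float]) -> float:
    """ROC AUC as P(random positive scores above random negative), ties = 0.5."""
    pos, neg = list(scores_pos), list(scores_neg)
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

    For example, with hypothetical selections `{"logistic_regression": {1045.6, 2210.3, 3380.9}, "hierarchical_clustering": {1045.6, 4102.2}, "t_test": {1045.6, 2210.3, 4102.2}, "cart": {1045.6, 2210.3}}`, requiring three of four methods keeps `[1045.6, 2210.3]`. A model built on those peaks could then be scored with `roc_auc` on held-out case and control scores.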

    Particle Swarm Optimization with Reinforcement Learning for the Prediction of CpG Islands in the Human Genome

    BACKGROUND: Regions of a genome with abundant GC nucleotides, a high CpG count, and a length greater than 200 bp are often referred to as CpG islands. These islands are usually located at the 5' end of genes. Several algorithms for the prediction of CpG islands have recently been proposed. METHODOLOGY/PRINCIPAL FINDINGS: We propose a new method, called CPSORL, which combines a complement particle swarm optimization algorithm with reinforcement learning to predict CpG islands more reliably. Several CpG island prediction tools based on the sliding window technique have been developed previously. However, the quality of their results seems to rely heavily on the chosen window sizes, so these methods leave room for improvement. CONCLUSIONS/SIGNIFICANCE: Experimental results indicate that CPSORL achieves higher sensitivity and a higher correlation coefficient in all selected experimental contigs than the other methods it was compared to (CpGIS, CpGcluster, CpGProd and CpGPlot). More CpG islands were identified in chromosomes 21 and 22 of the human genome than with the other methods from the literature. CPSORL also achieved the highest coverage rate (3.4%). CPSORL can be used to identify promoter and TSS regions associated with CpG islands across the entire human genome. Compared with CpGcluster, the islands predicted by CPSORL covered a larger region of the TSS (12.2%) and promoter (26.1%). If Alu sequences are considered, the islands predicted by CPSORL (Alu) covered a larger region of the TSS (40.5%) and promoter (67.8%) than CpGIS. Furthermore, CPSORL was used to verify that the average methylation density of CpG islands across the entire human genome was 5.33%.
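    For context on the sliding-window baseline that CPSORL is compared against, here is a minimal scanner using the classic Gardiner-Garden and Frommer criteria (window of at least 200 bp, GC content above 50%, observed/expected CpG ratio above 0.6). This is the conventional window approach, not CPSORL; the function names and default parameters are illustrative assumptions.

```python
from typing import List, Tuple


def cpg_stats(seq: str) -> Tuple[float, float]:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    c, g = seq.count("C"), seq.count("G")
    cpg = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    gc_fraction = (c + g) / n if n else 0.0
    # Expected CpG count under independence is (C * G) / length.
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_fraction, obs_exp


def scan_windows(seq: str, window: int = 200, step: int = 1,
                 min_gc: float = 0.5, min_obs_exp: float = 0.6) -> List[int]:
    """Start positions of windows meeting the Gardiner-Garden & Frommer criteria."""
    hits = []
    for start in range(0, len(seq) - window + 1, step):
        gc, oe = cpg_stats(seq[start:start + window])
        if gc > min_gc and oe > min_obs_exp:
            hits.append(start)
    return hits


def merge_hits(hits: List[int], window: int) -> List[Tuple[int, int]]:
    """Merge overlapping qualifying windows into half-open (start, end) regions."""
    regions: List[Tuple[int, int]] = []
    for start in hits:
        end = start + window
        if regions and start <= regions[-1][1]:
            regions[-1] = (regions[-1][0], max(regions[-1][1], end))
        else:
            regions.append((start, end))
    return regions
```

    On a toy sequence `"AT" * 100 + "CG" * 150 + "AT" * 100`, a 200 bp window with step 50 reports one merged region, `(150, 550)`, which extends beyond the true CG-rich stretch `[200, 500)`. Changing `window` or `step` changes the reported boundaries. This is the sensitivity to window size that the abstract cites as a weakness of sliding-window tools.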

    Accurate peak list extraction from proteomic mass spectra for identification and profiling studies

    Background: Mass spectrometry is an essential technique in proteomics, both to identify the proteins in a biological sample and to compare the proteomic profiles of different samples. In both cases, the main phase of the data analysis is the extraction of significant features from a mass spectrum. Its final output is the so-called peak list, which contains the mass, charge, and intensity of every detected biomolecule. The main steps of the peak list extraction procedure are usually preprocessing, peak detection, peak selection, charge determination, and monoisotoping. Results: This paper describes an original algorithm for peak list extraction from low- and high-resolution mass spectra. It was developed principally to improve the precision of peak extraction compared with other reference algorithms. It contains many innovative features, including a sophisticated method for managing overlapping isotopic distributions. Conclusions: The performance of the basic version of the algorithm and of its optional functionalities is evaluated on SELDI-TOF, MALDI-TOF, and ESI-FTICR ECD mass spectra. Executable files of MassSpec, a MATLAB implementation of the peak list extraction procedure for Windows and Linux systems, can be downloaded free of charge by nonprofit institutions from the following web site: http://aimed11.unipv.it/MassSpec
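    Three of the steps listed above (peak detection and selection, charge determination, monoisotoping) can be sketched in a few functions. This is a deliberately simple illustration, not the MassSpec algorithm, and it does not handle overlapping isotopic distributions. It assumes positive-mode protonated ions ([M + zH]^z+), takes the first (lowest m/z) peak of an envelope as the monoisotopic one, and uses hypothetical function names and thresholds.

```python
from typing import List, Optional, Sequence, Tuple

ISOTOPE_SPACING = 1.003355  # Da, mass difference between 13C and 12C
PROTON_MASS = 1.007276      # Da


def detect_peaks(mz: Sequence[float], intensity: Sequence[float],
                 min_intensity: float) -> List[Tuple[float, float]]:
    """Peak detection + selection: local maxima at or above an intensity threshold."""
    peaks = []
    for i in range(1, len(intensity) - 1):
        if (intensity[i] >= min_intensity
                and intensity[i] > intensity[i - 1]
                and intensity[i] >= intensity[i + 1]):
            peaks.append((mz[i], intensity[i]))
    return peaks


def estimate_charge(envelope_mz: Sequence[float], max_charge: int = 4,
                    tol: float = 0.02) -> Optional[int]:
    """Charge determination from the m/z spacing of one isotopic envelope."""
    if len(envelope_mz) < 2:
        return None
    spacings = [b - a for a, b in zip(envelope_mz, envelope_mz[1:])]
    mean_spacing = sum(spacings) / len(spacings)
    for z in range(1, max_charge + 1):
        # Isotope peaks of a z-charged ion sit about 1.003355 / z apart in m/z.
        if abs(mean_spacing - ISOTOPE_SPACING / z) <= tol:
            return z
    return None


def monoisotopic_mass(mono_mz: float, charge: int) -> float:
    """Neutral monoisotopic mass of an [M + zH]^z+ ion from its first isotope peak."""
    return charge * (mono_mz - PROTON_MASS)
```

    For an envelope observed at m/z 1000.0, 1000.5017, and 1001.0034, the spacing of about 0.5 indicates charge 2, giving a neutral monoisotopic mass of about 1997.985 Da. Real spectra need the preprocessing (baseline removal, smoothing) that the abstract lists first, which this sketch omits.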

    Role of Sphingosine Kinase 1 and Sphingosine-1-Phosphate Axis in Hepatocellular Carcinoma

    Hepatocellular carcinoma (HCC) is primarily diagnosed in the later stages of disease progression and is the third leading cause of cancer deaths worldwide. Thus, there is a need to identify biomarkers of early HCC and to develop more effective treatments for the disease. Sphingosine-1-phosphate (S1P) is a pleiotropic lipid signaling molecule, produced by two isoforms of sphingosine kinase (SphK1 and SphK2), that is involved in regulating many aspects of mammalian physiology and pathophysiology, including inflammation, epithelial and endothelial barrier function, cancer, and metastasis, among many others. Abundant evidence indicates that SphK1 and S1P promote cancer progression and metastasis in multiple types of cancer. However, the role of SphK/S1P in HCC is less well studied. Here, we review the current state of knowledge of SphKs and S1P in HCC, including evidence for the correlation of SphK1 expression and S1P levels with HCC progression and negative outcomes, and discuss how this information could lead to the design of more effective diagnostic and treatment modalities for HCC.

    Metabolomic profiles of hepatocellular carcinoma in a European prospective cohort

    Background: Hepatocellular carcinoma (HCC), the most prevalent form of liver cancer, is difficult to diagnose and has limited treatment options with a low survival rate. Aside from a few key risk factors, such as hepatitis, high alcohol consumption, smoking, obesity, and diabetes, the etiology of the disease is incompletely understood, and little progress has been made in identifying early risk biomarkers. Methods: To address these gaps, an untargeted nuclear magnetic resonance metabolomic approach was applied to pre-diagnostic serum samples obtained from first incident, primary HCC cases (n = 114) and matched controls (n = 222) identified from among the participants of a large European prospective cohort. Results: A metabolic pattern associated with HCC risk, comprising perturbations in fatty acid oxidation and in amino acid, lipid, and carbohydrate metabolism, was observed. Sixteen metabolites of either endogenous or exogenous origin were found to be significantly associated with HCC risk. The influence of hepatitis infection and potential liver damage was assessed, and further analyses were made to distinguish patterns of early or later diagnosis. Conclusion: Our results show clear metabolic alterations from early stages of HCC development, with applications for better etiologic understanding, prevention, and early detection of this increasingly common cancer. This work was supported by the French National Cancer Institute (L’Institut National du Cancer; INCA; grant number 2009-139; PI: M. Jenab). AF received financial support (BDI fellowship) from the Centre National de la Recherche Scientifique (CNRS) and Bruker Biospin. The coordination of EPIC is financially supported by the European Commission (DG-SANCO) and the International Agency for Research on Cancer. 
The national cohorts are supported by Danish Cancer Society (Denmark); Ligue Contre le Cancer, Institut Gustave Roussy, Mutuelle Générale de l’Education Nationale, and Institut National de la Santé et de la Recherche Médicale (INSERM) (France); Deutsche Krebshilfe, Deutsches Krebsforschungszentrum (DKFZ), and Federal Ministry of Education and Research (Germany); Hellenic Health Foundation (Greece); Italian Association for Research on Cancer (AIRC), National Research Council, Associazione Italiana per la Ricerca sul Cancro-AIRC-Italy, and AIRE-ONLUS Ragusa, AVIS Ragusa, Sicilian Government (Italy); Dutch Ministry of Public Health, Welfare and Sports (VWS), Netherlands Cancer Registry (NKR), LK Research Funds, Dutch Prevention Funds, Dutch ZON (Zorg Onderzoek Nederland), World Cancer Research Fund (WCRF), and Statistics Netherlands (the Netherlands); European Research Council (ERC; grant number ERC-2009-AdG 232997) and Nordforsk, and Nordic Center of Excellence Programme on Food, Nutrition and Health (Norway); Health Research Fund (FIS), Regional Governments of Andalucía, Asturias, Basque Country, Murcia (No. 6236) and Navarra, and ISCIII RETIC (RD06/0020) (Spain); Swedish Cancer Society, Swedish Scientific Council, and Regional Government of Skåne and Västerbotten (Sweden); Cancer Research UK, Medical Research Council, Stroke Association, British Heart Foundation, Department of Health, Food Standards Agency, and Wellcome Trust (UK)